Integer Linear Programming Approach to Median and Center Strings for a Probability Distribution on a Set of Strings

Authors

  • Morihiro Hayashida
  • Hitoshi Koyano
Abstract

For a data set composed of numbers or numerical vectors, the mean is the most fundamental measure of the center of the data. For a data set of strings, however, a mean cannot be defined, so median and center strings are frequently used as measures of the center instead. Unlike calculating the mean of numerical data, constructing median and center strings of string data is not easy, and no algorithm is known that is guaranteed to construct exact solutions for center strings. In this study, we first generalize the definitions of median and center strings of string data to those of a probability distribution on the set of all strings over a given alphabet. This generalization corresponds to generalizing the mean of numerical data to the expected value of a probability distribution on a set of numbers or numerical vectors. Next, we develop methods for constructing exact median and center strings for a probability distribution on a set of strings by applying integer linear programming. When the set of strings is a metric space under the Levenshtein distance, these methods are made faster by using the triangle inequality for the Levenshtein distance.

Date: January 12, 2017 (Thursday)
Time: 11:00 am to 12:00 noon
Place: Room 309, Run Run Shaw Bldg., HKU
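
To make the notions in the abstract concrete, here is a minimal Python sketch. It is not the authors' integer linear programming method, whose formulation the abstract does not give. It assumes a common reading of the definitions: a median string minimizes the expected Levenshtein distance to a string drawn from the distribution, and a center string minimizes the maximum distance to any string with positive probability. It also searches only a small, finite candidate pool by brute force, whereas the exact problem ranges over all strings. The function names and the example distribution are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Levenshtein (edit) distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def expected_distance(candidate: str, dist: Dict[str, float]) -> float:
    """Expected distance from candidate to a string drawn from dist."""
    return sum(p * levenshtein(candidate, s) for s, p in dist.items())


def max_distance(candidate: str, dist: Dict[str, float]) -> int:
    """Largest distance from candidate to any string with positive probability."""
    return max(levenshtein(candidate, s) for s, p in dist.items() if p > 0)


def median_and_center(candidates: Iterable[str],
                      dist: Dict[str, float]) -> Tuple[str, str]:
    """Brute-force search over a finite candidate pool (an assumption of this
    sketch; the exact problem considers all strings over the alphabet)."""
    pool = list(candidates)
    median = min(pool, key=lambda c: expected_distance(c, dist))
    center = min(pool, key=lambda c: max_distance(c, dist))
    return median, center


if __name__ == "__main__":
    # Hypothetical distribution on three strings over the alphabet {a, b}.
    dist = {"aaa": 0.6, "aab": 0.3, "bbb": 0.1}
    median, center = median_and_center(dist.keys(), dist)
    print("median:", median, "center:", center)
```

In this example the two notions pick different strings: the median follows the probability mass, while the center depends only on which strings have positive probability.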


Similar resources

Study of Random Biased d-ary Tries Model

Tries are the most popular data structure on strings. We can construct d-ary tries from strings over an alphabet. Throughout the paper, we assume that the strings stored in a trie are generated by an appropriate memoryless source. In this paper, using a special combinatorial approach, we extend their analysis of average profiles to d-ary tries. We use this combinatorial appr...

A Non-linear Integer Bi-level Programming Model for Competitive Facility Location of Distribution Centers

The facility location problem is a strategic decision for a supply chain that determines the profitability and sustainability of its components. This paper deals with a scenario where two supply chains, consisting of a producer, a number of distribution centers, and several retailers provided with similar products, compete to maintain their market shares by opening new distribution cent...

A Chance Constrained Integer Programming Model for Open Pit Long-Term Production Planning

Mine production planning defines a sequence of block extraction to obtain the highest NPV under a number of constraints. Mathematical programming has been a widespread approach to optimizing production planning for open pit mines since the 1960s. However, previous and existing models are limited in their ability to explicitly incorporate ore grade uncertainty into the p...

Probabilistic analysis of the asymmetric digital search trees

In this paper, by applying three functional operators, the previous results on the (Poisson) variance of the external profile in digital search trees are improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...

A mixed integer linear programming formulation for a multi-stage, multi-Product, multi-vehicle aggregate production-distribution planning problem

In today’s competitive marketplace, companies seek an efficient supply chain structure so as to provide customers with the highest value and achieve competitive advantage. This requires a broader perspective than just the borders of an individual company within a supply chain. This paper investigates an aggregate production planning problem integrated with distribution issues in a supply chain ...


Publication date: 2016